Remove `PDFPageView.updatePosition` since it's not actually necessary #8531

Snuffleupagus · 2017-06-15T11:30:46Z

This method is currently called from PDFViewer._scrollUpdate on every scroll event in the viewer.

However, I cannot see why this code is now necessary (assuming that it once was), since text-selection and searching still works exactly the same way with this patch as with the current master.

When PDFPageView.updatePosition is called, the page can be in either of these states:

The page hasn't been rendered, in which case the textLayer property doesn't exist yet.
The page is currently rendering, meaning that the textLayer property exists. Given that the textContent won't be fetched until the page has been successfully rendered, TextLayerBuilder.render will return immediately and effectively be a no-op (since there's nothing to render yet).
The has been been rendered, and the textLayer is currently rendering.
The page, and its textLayer, has been completely rendered. In this case, TextLayerBuilder.render will return immediately and effectively be a no-op.

Here, only the third case seem to require any further analysis:
When scrolling occurs while the textLayer is rendering, TextLayerBuilder.render will via a helper method call TextLayerRenderTask.cancel (in src/display/text_layer.js) to stop processing.
However, due to the run-to-completion nature of JavaScript, once TextLayerRenderTask._render has been invoked appendText will always run.[1]

So even though we cancel rendering of pending textLayers during scrolling, via the repeated TextLayerBuilder.render calls from within the PDFPageView.updatePosition method, that does not prevent us from running the code inside of TextLayerRenderTask._render over and over for the same page; which all seems very inefficient to me.[2]

All this will thus have the effect of delaying the actual rendering of a textLayer ever so slightly while scrolling in the viewer. However, it does so at the expense of potentially hundreds of unnecessary appendText calls.[3]

Hence it seems to me that it's less resource intensive overall to simply let rendering of the textLayer complete, once it has started. Obviously, we still abort all rendering of a page, and its textLayer, when it's being destroyed (e.g. by being evicted from the page cache).

In case that there's any worry that the patch could affect e.g. highlighting of search results, please note that the existing code in TextLayerBuilder.render already calls updateMatches when the TextLayerTask resolves successfully.

I'm sorry that this became quite long, but to try and summarize:
PDFPageView.updatePosition doesn't actually do anything in most cases. In the one case where it matters, it seems that it's actually doing more harm than good; which is why I'm proposing that we just remove it.

[1] Although we may be able to skip the render call, provided that it happens after a timeout (as is the case in the default viewer).
[2] With current work being done to support streaming of TextContent, we'd also need to add just as many duplicate API calls to PDFPageView.updatePosition.
[3] The number of duplicate appendText calls is directly proportional not only to the scroll speed, but also to the number of pages in the document.

This method is currently called from `PDFViewer._scrollUpdate` on *every* scroll event in the viewer. However, I cannot see why this code is now necessary (assuming that it once was), since text-selection and searching still works *exactly* the same way with this patch as with the current `master`. When `PDFPageView.updatePosition` is called, the page can be in either of these states: 1. The page hasn't been rendered, in which case the `textLayer` property doesn't exist yet. 2. The page is currently rendering, meaning that the `textLayer` property exists. Given that the `textContent` won't be fetched until the page has been successfully rendered, `TextLayerBuilder.render` will return immediately and effectively be a no-op (since there's nothing to render yet). 3. The has been been rendered, and the `textLayer` is currently rendering. 4. The page, and its `textLayer`, has been completely rendered. In this case, `TextLayerBuilder.render` will return immediately and effectively be a no-op. Here, only the *third* case seem to require any further analysis: When scrolling occurs while the `textLayer` is rendering, `TextLayerBuilder.render` will via a helper method call `TextLayerRenderTask.cancel` (in src/display/text_layer.js) to stop processing. However, due to the run-to-completion nature of JavaScript, once `TextLayerRenderTask._render` has been invoked `appendText` will always run.[1] So even though we cancel rendering of pending `textLayer`s during scrolling, via the repeated `TextLayerBuilder.render` calls from within the `PDFPageView.updatePosition` method, that does *not* prevent us from running the code inside of `TextLayerRenderTask._render` over and over for the *same* page; which all seems *very* inefficient to me.[2] All this will thus have the effect of delaying the *actual* rendering of a `textLayer` ever so slightly while scrolling in the viewer. However, it does so at the expense of potentially hundreds of unnecessary `appendText` calls.[3] Hence it seems to me that it's less resource intensive overall to simply let rendering of the `textLayer` complete, once it has started. Obviously, we still abort all rendering of a page, and its `textLayer`, when it's being destroyed (e.g. by being evicted from the page cache). In case that there's any worry that the patch could affect e.g. highlighting of search results, please note that the existing code in `TextLayerBuilder.render` already calls `updateMatches` when the `TextLayerTask` resolves successfully. *I'm sorry that this became quite long, but to try and summarize:* `PDFPageView.updatePosition` doesn't actually do anything in *most* cases. In the one case where it matters, it seems that it's actually doing more harm than good; which is why I'm proposing that we just remove it. --- [1] Although we may be able to skip the `render` call, provided that it happens *after* a `timeout` (as is the case in the default viewer). [2] With current work being done to support streaming of `TextContent`, we'd also need to add just as many duplicate API calls to `PDFPageView.updatePosition`. [3] The number of duplicate `appendText` calls is directly proportional not only to the scroll speed, but also to the number of pages in the document.

yurydelendik · 2017-06-15T12:35:12Z

/botio-windows preview

pdfjsbot · 2017-06-15T12:35:13Z

From: Bot.io (Windows)

Received

Command cmd_preview from @yurydelendik received. Current queue size: 0

Live output at: http://54.215.176.217:8877/4ae586d55906d3b/output.txt

pdfjsbot · 2017-06-15T12:54:42Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/4ae586d55906d3b/output.txt

Total script time: 19.48 mins

Published

yurydelendik · 2017-06-15T17:09:31Z

I was looking to the old code (fcdf266) and I think this is the right move now. It was a problem back then due to non-optimized text layer building and lots of divs.

If we will feel we regress div heavy text layers, we will add "reset timeout" functionality to match the behavior of the original 5-year old commit.

yurydelendik · 2017-06-15T17:09:44Z

Thank you for the patch.

Remove `PDFPageView.updatePosition` since it's not actually necessary

Snuffleupagus added the viewer label Jun 15, 2017

yurydelendik merged commit 2097513 into mozilla:master Jun 15, 2017

yurydelendik mentioned this pull request Jun 15, 2017

Streams get text content #8488

Merged

4 tasks

Snuffleupagus deleted the rm-updatePosition branch June 16, 2017 13:55

movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018

Merge pull request mozilla#8531 from Snuffleupagus/rm-updatePosition

3c89def

Remove `PDFPageView.updatePosition` since it's not actually necessary

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove `PDFPageView.updatePosition` since it's not actually necessary #8531

Remove `PDFPageView.updatePosition` since it's not actually necessary #8531

Snuffleupagus commented Jun 15, 2017

yurydelendik commented Jun 15, 2017

pdfjsbot commented Jun 15, 2017

pdfjsbot commented Jun 15, 2017

yurydelendik commented Jun 15, 2017

yurydelendik commented Jun 15, 2017

Remove PDFPageView.updatePosition since it's not actually necessary #8531

Remove PDFPageView.updatePosition since it's not actually necessary #8531

Conversation

Snuffleupagus commented Jun 15, 2017

yurydelendik commented Jun 15, 2017

pdfjsbot commented Jun 15, 2017

From: Bot.io (Windows)

Received

pdfjsbot commented Jun 15, 2017

From: Bot.io (Windows)

Success

Published

yurydelendik commented Jun 15, 2017

yurydelendik commented Jun 15, 2017

Remove `PDFPageView.updatePosition` since it's not actually necessary #8531

Remove `PDFPageView.updatePosition` since it's not actually necessary #8531